Rank | Count | Beginning |
---|---|---|
18 | 14625 | Pada |
59 | 10527 | Ia |
40 | 8185 | Di |
37 | 5229 | Dalam |
25 | 4040 | Setelah |
48 | 2916 | Namun |
24 | 2797 | Selain |
90 | 2572 | Dengan |
20 | 2523 | Dia |
41 | 2409 | Untuk |
477 | 2257 | Mereka |
340 | 1957 | Karena |
613 | 1879 | Kota |
96 | 1726 | Hal |
67 | 1568 | Saat |
316 | 1537 | Beberapa |
28 | 1523 | Dari |
184 | 1468 | Sebagai |
342 | 1458 | Ketika |
3 | 1424 | Ibu |
31 | 1362 | Nama |
357 | 1353 | Sejak |
204 | 1334 | Ada |
75 | 1314 | Menurut |
104 | 1209 | Ini |
442 | 1159 | Para |
375 | 1138 | Kemudian |
401 | 1136 | Tahun |
103 | 1109 | Didasarkan |
202 | 1090 | Sebuah |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
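For readers without access to the database, the same counting can be sketched in Python. This is a minimal illustration, not the pipeline used to produce the table above: the function name `sentence_beginnings`, its parameters, and the toy sentences are all assumptions introduced here. It mirrors `substring_index(sentence, ' ', N)` by keeping everything before the N-th space, so it also covers the N=2, 3, 4 cases of the following subsections.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent n-word sentence beginnings.

    Hypothetical helper: splits on single spaces only, like
    substring_index(sentence, ' ', n) in the query above.
    """
    counts = Counter()
    for sentence in sentences:
        # Keep the text before the n-th space; a sentence with fewer
        # than n words is kept whole, as substring_index would do.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # Equivalent of "order by cnt desc limit 50".
    return counts.most_common(limit)


# Toy input, invented for illustration only (not corpus data).
sample = [
    "Pada tahun itu ia pindah.",
    "Pada hari Senin hujan turun.",
    "Ia tinggal di kota.",
]

top_one_word = sentence_beginnings(sample, n=1)
top_two_words = sentence_beginnings(sample, n=2)
```

On the toy input, `top_one_word` lists `Pada` first with a count of 2, while every two-word beginning occurs only once. Splitting on a single space rather than on arbitrary whitespace is a deliberate choice to stay close to the SQL; a real corpus may need tokenization that also handles punctuation.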